Secondary Connectives in the Prague Dependency Treebank
Abstract
The paper introduces a new annotation of discourse relations in the Prague Dependency Treebank (PDT): the annotation of so-called secondary connectives (mainly multiword phrases such as the condition is, that is the reason why, to conclude, this means, etc.). First, the paper gives a theoretical introduction to these expressions (mainly in relation to primary connectives such as and, but, or, too, etc.) and aims to contribute to the description and definition of discourse connectives in general (both primary and secondary). Second, the paper demonstrates how secondary connectives can be annotated in large corpora such as PDT. It describes the general annotation principles for secondary connectives used in PDT for Czech and compares the results of this annotation with the annotation of primary connectives in PDT. The main aim of the paper is thus to introduce a new type of discourse annotation that could also be adopted for other languages.
Similar Resources
Designing CzeDLex - A Lexicon of Czech Discourse Connectives
We present a design for a new electronic lexicon of Czech discourse connectives. The data format and the annotation scheme are based on a study of similar existing resources, and we discuss arguments for choosing the data structure and selecting features of the lexicon entries. Special attention is paid to a consistent encoding of both primary and secondary connectives. The data itself comes ...
Annotation of Discourse Connectives for the Prague Dependency Treebank
The paper presents a preliminary study on discourse connectives (DC) in Czech. Aiming to build a computerized language corpus capturing discourse relations in Czech, we base our observations on current foreign projects with the same purpose. In this study, first, the different methods of linguistic analysis of the discourse structure and discourse connectives are described, next, the nature and...
From Sentence to Discourse: Building an Annotation Scheme for Discourse Based on Prague Dependency Treebank
The present paper reports on preparatory research for building a language corpus annotation scenario capturing the discourse relations in Czech. We primarily focus on the description of the syntactically motivated relations in discourse, basing our findings on the theoretical background of the Prague Dependency Treebank 2.0 and the Penn Discourse Treebank 2. Our aim is to revisit the present-...
Semi-Automatic Annotation of Intra-Sentential Discourse Relations in PDT
In the present paper, we describe in detail and evaluate the process of semi-automatic annotation of intra-sentential discourse relations in the Prague Dependency Treebank, which is a part of the project of otherwise mostly manual annotation of all (intra- and inter-sentential) discourse relations with explicit connectives in the treebank. Our assumption that some syntactic features of a sentence...
Introducing the Prague Discourse Treebank 1.0
We present the Prague Discourse Treebank 1.0, a collection of Czech texts annotated for various discourse-related phenomena "beyond the sentence boundary". The treebank contains manual annotations of (1) discourse connectives, their arguments and senses, (2) textual coreference, and (3) bridging anaphora, all carried out on 50k sentences of the treebank. Contrary to most similar projects, th...